1 Bone mineral density modelling

  1. Load data and packages
library(mgcViz)
load("data/calcium.rda")

m1 <- qgamV(bmd ~ group + age, data = calcium, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.........done
summary(m1)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## bmd ~ group + age
## 
## Parametric coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.338254   0.062536   5.409 6.34e-08 ***
## groupP      -0.030854   0.007640  -4.039 5.38e-05 ***
## age          0.049246   0.005189   9.490  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## R-sq.(adj) =  0.215   Deviance explained = 14.7%
## -REML = -566.39  Scale est. = 1         n = 501

Placebo has a negative effect, as one would expect.

  2. Verify whether the proportion of observations falling below the fit depends on the subject:
check1D(m1, calcium$person) + l_gridQCheck1D(qu = 0.5)

We have some massive departures from 0.5. We need to include a random effect for subject:

m2 <- qgamV(bmd ~ group + age + s(person, bs = "re"), data = calcium, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5........done
check1D(m2, calcium$person) + l_gridQCheck1D(qu = 0.5)

summary(m2)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## bmd ~ group + age + s(person, bs = "re")
## 
## Parametric coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.3258396  0.0150015  21.720   <2e-16 ***
## groupP      -0.0220675  0.0130393  -1.692   0.0906 .  
## age          0.0506576  0.0009908  51.127   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##             edf Ref.df Chi.sq p-value    
## s(person) 108.5    110  26809  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.956   Deviance explained = 85.9%
## -REML = -1077.5  Scale est. = 1         n = 501

This looks much better: individual differences clearly must be taken into account. However, notice that the effect of group (placebo vs calcium supplement) is much weaker now.
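Conceptually, the check above computes, for each subject, the fraction of observations falling below the fitted quantile. A minimal base-R sketch of the same idea, on simulated data (the subject means and sample sizes below are made up purely for illustration):

```r
# Simulate 20 subjects whose responses differ mainly through a subject-
# specific intercept, then "fit" an overall median that ignores subjects.
set.seed(1)
subject <- rep(1:20, each = 25)
y <- rnorm(500, mean = subject)   # subject-specific means 1, 2, ..., 20
fit <- median(y)                  # stand-in for a fitted median with no subject effect

# Per-subject fraction of observations below the fit: this is roughly what
# check1D() + l_gridQCheck1D(qu = 0.5) visualises. With a well-specified
# median model each fraction should be near 0.5; here it is near 1 for
# low-mean subjects and near 0 for high-mean ones, signalling that a
# random effect for subject is missing.
prop_below <- tapply(y < fit, subject, mean)
range(prop_below)
```

Systematic departures from 0.5 across a grouping variable are exactly the pattern the grid check flags.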

  3. The effect of age may be non-linear, so we use a smooth effect here:
m3 <- qgamV(bmd ~ group + s(age) + s(person, bs = "re"), data = calcium, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5...........done
print(plot(m3, allTerms = T), pages = 1)

AIC(m2) - AIC(m3)
## [1] 16.43664

Visually the effect of age seems fairly linear, but it may be levelling off slightly after age 12, and the smooth version leads to a lower AIC.

  4. Verify whether the effect of age differs between groups:
m4 <- qgamV(bmd ~ group + s(age, by = group) + s(person, bs = "re"), data = calcium, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..........done
print(plot(m4, select = 1:2), pages = 1)

Difficult to say by staring at the two effects above, better to plot the difference between the two smooths directly:

plotDiff(sm(m4, 1), sm(m4, 2)) + l_fitLine() + l_ciLine()

There might actually be a difference: the group taking calcium has lower mineral density before 12 years of age, and the difference reverses afterward. The plot seems to suggest that the supplement starts having an effect after 6 months of intake, and that its effect levels off after 1.5 years.
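plotDiff builds the smooth difference and its standard error from the model's linear predictor matrix and coefficient covariance. A hedged sketch of that computation using plain mgcv, on simulated two-group data (all variable names below are made up; the real model would also include the random effect):

```r
library(mgcv)

# Two groups whose smooths genuinely differ by a linear term in x
set.seed(2)
n <- 400
x <- runif(n)
g <- factor(rep(c("A", "B"), each = n / 2))
y <- sin(2 * pi * x) + (g == "B") * x + rnorm(n, sd = 0.2)
m <- gam(y ~ g + s(x, by = g))

# Evaluate both by-group smooths on a grid via the linear predictor matrix
x0 <- seq(0.05, 0.95, length.out = 50)
XA <- predict(m, newdata = data.frame(x = x0, g = "A"), type = "lpmatrix")
XB <- predict(m, newdata = data.frame(x = x0, g = "B"), type = "lpmatrix")
Xd <- XA - XB
Xd[, "gB"] <- 0   # zero the group dummy so that only the smooths are compared

smooth_diff <- drop(Xd %*% coef(m))           # f_A(x0) - f_B(x0)
se <- sqrt(rowSums((Xd %*% vcov(m)) * Xd))    # pointwise standard error
```

The pointwise confidence band drawn by l_ciLine corresponds to smooth_diff plus or minus a multiple of se.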

  5. We can fit this same model to several quantiles and plot all the terms:
m5 <- mqgamV(bmd ~ group + s(age, by = group) + s(person, bs = "re"), data = calcium, 
             qu = seq(0.2, 0.8, length.out = 5))
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..........done 
## qu = 0.35..............done 
## qu = 0.65.........done 
## qu = 0.2..................done 
## qu = 0.8.................done
print(plot(m5, allTerms = T), pages = 1)

All the effects seem quite stable across quantiles. We can still look at the differences in the age effect by doing, for instance:

plotDiff(sm(m5[[5]], 1), sm(m5[[5]], 2)) + l_fitLine() + l_ciLine()

2 Self-paced reading latencies for Russian

  1. Load data and create model formula
library(mgcViz);
load("data/russian.rda")

form <- RT ~ ActOrtho + ActTAM + SubjectSpeedUp + PositionInSentence + Trial
  2. Fit a median model
fit <- qgamV(form, data=russian, qu=0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5........done
summary(fit)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## RT ~ ActOrtho + ActTAM + SubjectSpeedUp + PositionInSentence + 
##     Trial
## 
## Parametric coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          705.868     30.894  22.848  < 2e-16 ***
## ActOrtho              -8.300      6.137  -1.353  0.17620    
## ActTAM                59.548     46.407   1.283  0.19943    
## SubjectSpeedUp     -2778.809    870.924  -3.191  0.00142 ** 
## PositionInSentence   -28.700     14.414  -1.991  0.04647 *  
## Trial                -41.877     13.191  -3.175  0.00150 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## R-sq.(adj) =  0.0217   Deviance explained = 6.36%
## -REML = 4840.8  Scale est. = 1         n = 673

The two neural network learning measures do not seem to be important.

  3. Check the proportion of observations falling below the fit for each subject:
check1D(fit, x = russian$Subject) + l_gridQCheck1D(qu = 0.5) 

There are massive individual differences: we need a random effect for subject.

  4. Fit the model which now includes a random effect for Subject:
form <- RT ~ ActOrtho + ActTAM + SubjectSpeedUp + PositionInSentence + Trial + s(Subject, bs = "re")
fit2 <- qgamV(form, data=russian, qu=0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5................done
summary(fit2)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## RT ~ ActOrtho + ActTAM + SubjectSpeedUp + PositionInSentence + 
##     Trial + s(Subject, bs = "re")
## 
## Parametric coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          758.691     48.924  15.507  < 2e-16 ***
## ActOrtho              -9.336      2.894  -3.226 0.001255 ** 
## ActTAM                41.839     20.512   2.040 0.041372 *  
## SubjectSpeedUp     -2001.326   3095.456  -0.647 0.517932    
## PositionInSentence   -23.763      6.644  -3.577 0.000348 ***
## Trial                -33.424      5.499  -6.078 1.22e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df Chi.sq p-value    
## s(Subject) 33.36     34   2196  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.686   Deviance explained = 60.6%
## -REML = 4414.1  Scale est. = 1         n = 673
check1D(fit2, x = russian$Subject) + l_gridQCheck1D(qu = 0.5) 

AIC(fit) - AIC(fit2) 
## [1] 976.7609

The residuals look much better now, and we obtain a lower AIC. Notice that the effects of ActOrtho and ActTAM are now significant.

  5. Estimate the model for several quantiles:
qus <- seq(0.1, 0.9, length.out = 11)
form <- RT ~ ActOrtho + ActTAM + SubjectSpeedUp + PositionInSentence + Trial + s(Subject, bs = "re")
fit <- mqgamV(form, data=russian, qu = qus)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5................done 
## qu = 0.42.......
## qu = 0.42, log(sigma) = 4.041908 : outer Newton did not converge fully.
## ...done 
## qu = 0.58.........done 
## qu = 0.34........done 
## qu = 0.66........done 
## qu = 0.26.......done 
## qu = 0.74........done 
## qu = 0.82..................done 
## qu = 0.18.........done 
## qu = 0.1.......done 
## qu = 0.9............done
print(plot(fit, allTerms = TRUE), pages = 1)

  6. Refit using smooth effects for all variables:
qus <- seq(0.1, 0.9, length.out = 11)
form <- RT ~ s(ActOrtho) + s(ActTAM) + s(SubjectSpeedUp) + 
             s(PositionInSentence) + s(Trial) + s(Subject, bs = "re")
fit2 <- mqgamV(form, data=russian, qu = qus)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5........done 
## qu = 0.42..........done 
## qu = 0.58............done 
## qu = 0.34............done 
## qu = 0.66...........done 
## qu = 0.26..................done 
## qu = 0.74..........done 
## qu = 0.82................done 
## qu = 0.18...........done 
## qu = 0.1............done 
## qu = 0.9..........done
print(plot(fit2), pages = 2, ask = F)

AIC(fit[[6]]) - AIC(fit2[[6]])
## [1] 42.94835

We achieve a lower AIC using the smooth effects. The effect of SubjectSpeedUp seems very non-linear for high quantiles, but it is not significant:

summary(fit[[11]])
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## RT ~ ActOrtho + ActTAM + SubjectSpeedUp + PositionInSentence + 
##     Trial + s(Subject, bs = "re")
## 
## Parametric coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        1049.061     67.488  15.545  < 2e-16 ***
## ActOrtho            -11.126      6.879  -1.617   0.1058    
## ActTAM               33.271     40.169   0.828   0.4075    
## SubjectSpeedUp     -164.986   3856.468  -0.043   0.9659    
## PositionInSentence  -57.701     13.224  -4.363 1.28e-05 ***
## Trial               -47.100     13.896  -3.390   0.0007 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df Chi.sq p-value    
## s(Subject) 31.38     34  918.2  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.571   Deviance explained = 76.6%
## -REML = 4984.5  Scale est. = 1         n = 673

3 CO2 modelling

  1. Load packages
library(mgcViz); library(gamair); 
data(co2s)
  2. Plot data
with(co2s, plot(c.month, co2, type="l"))

  3. Fit a model for the median
b <- qgam(co2~s(c.month, bs="cr", k=100), data=co2s, qu = 0.5, err = 0.1)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.........done
  4. Use it for prediction
co2plot <- function(co2s,b) {
  fv <- predict(b,data.frame(c.month=1:543,month=c(rep(1:12,45),1:3)),se=TRUE)
  ul <- fv$fit + 2*fv$se
  ll <- fv$fit - 2*fv$se
  with(co2s,plot(c.month,co2,pch=19,cex=.3,col=2,
                 ylim=range(c(ul,ll)),xlim=c(0,550)))
  lines(1:543,fv$fit)
  lines(1:543,ul,lty=2)
  lines(1:543,ll,lty=2)
}

co2plot(co2s,b) ## nonsense predictions - extrapolation artefact

  5. Fit a better model
b1 <- qgam(co2~s(c.month,bs="cr",k=50)+s(month,bs="cc"),data=co2s, qu = 0.5,
           argGam = list(knots=list(month=c(1,13))), err = 0.1)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5........done
  6. Predict again
co2plot(co2s,b1)

This is much better: the short-term seasonal effect has been separated from the long-term smooth trend, allowing longer-range extrapolation of the slow trend.
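To see why this works, here is a minimal mgcv-only sketch on simulated monthly data (the trend and seasonal shapes are made up). The knots = list(month = c(1, 13)) trick tells the cyclic basis that month 13 is the same as month 1, so December joins smoothly onto January and the seasonal term keeps cycling under extrapolation while only the slow trend is extended:

```r
library(mgcv)

# 20 years of monthly data: slow linear trend plus an annual cycle
set.seed(3)
t_idx <- 1:240
month <- rep(1:12, 20)
y <- 0.01 * t_idx + sin(2 * pi * month / 12) + rnorm(240, sd = 0.1)
d <- data.frame(y, t_idx, month)

# Cyclic cubic spline for month; knots at 1 and 13 make the cycle close up
b <- gam(y ~ s(t_idx, k = 20, bs = "cr") + s(month, bs = "cc"),
         data = d, knots = list(month = c(1, 13)))

# Extrapolate two years ahead: the seasonal component repeats, and only
# the smooth trend is extrapolated beyond the data range
new <- data.frame(t_idx = 241:264, month = rep(1:12, 2))
fv <- predict(b, newdata = new)
```

Without the cyclic decomposition, a single smooth of time would have to extrapolate the seasonal wiggles too, which is exactly the nonsense seen in the first co2plot above.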

4 Electricity load forecasting

  1. Load data and create model formula
library(mgcViz)
data("UKload")
form <- NetDemand~s(wM,k=20,bs='cr') + s(wM_s95,k=20,bs='cr') + 
        s(Posan, k=50, bs = "cr") + Dow + s(Trend,k=4,bs='cr') + 
        NetDemand.48 + Holy
  2. Fit model and plot smooths
qu <- 0.5
fit <- qgamV(form = form, data = UKload, qu = qu, err = 0.1)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5...............done
print(plot(fit), pages = 1)

summary(fit)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## NetDemand ~ s(wM, k = 20, bs = "cr") + s(wM_s95, k = 20, bs = "cr") + 
##     s(Posan, k = 50, bs = "cr") + Dow + s(Trend, k = 4, bs = "cr") + 
##     NetDemand.48 + Holy
## 
## Parametric coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.093e+04  5.445e+02   38.45   <2e-16 ***
## Dowjeudi      3.599e+03  9.733e+01   36.97   <2e-16 ***
## Dowlundi      6.149e+03  6.106e+01  100.71   <2e-16 ***
## Dowmardi      3.650e+03  1.009e+02   36.16   <2e-16 ***
## Dowmercredi   3.707e+03  9.639e+01   38.46   <2e-16 ***
## Dowsamedi    -1.667e+03  9.391e+01  -17.75   <2e-16 ***
## Dowvendredi   3.330e+03  9.675e+01   34.41   <2e-16 ***
## NetDemand.48  4.224e-01  1.461e-02   28.92   <2e-16 ***
## Holy1        -3.613e+03  3.056e+02  -11.82   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df Chi.sq p-value    
## s(wM)      7.109  8.733 339.95 < 2e-16 ***
## s(wM_s95)  3.581  4.743  18.65 0.00182 ** 
## s(Posan)  23.875 29.239 476.13 < 2e-16 ***
## s(Trend)   2.938  2.997 437.68 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.946   Deviance explained = 86.4%
## -REML =  16283  Scale est. = 1         n = 2008

The effect of Posan is fairly wiggly and drops sharply in the Christmas period.

  3. Modify model formula and refit
form <- NetDemand~s(wM,k=20,bs='cr') + s(wM_s95,k=20,bs='cr') + 
        s(Posan, bs='ad', k=50) + Dow + s(Trend,k=4) + 
        NetDemand.48 + Holy

fit <- qgamV(form = form, data = UKload, qu = qu, err = 0.1)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5...............done
print(plot(fit), pages = 1)

summary(fit)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## NetDemand ~ s(wM, k = 20, bs = "cr") + s(wM_s95, k = 20, bs = "cr") + 
##     s(Posan, bs = "ad", k = 50) + Dow + s(Trend, k = 4) + NetDemand.48 + 
##     Holy
## 
## Parametric coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.084e+04  5.486e+02   37.99   <2e-16 ***
## Dowjeudi      3.588e+03  9.842e+01   36.46   <2e-16 ***
## Dowlundi      6.152e+03  6.102e+01  100.82   <2e-16 ***
## Dowmardi      3.646e+03  1.016e+02   35.88   <2e-16 ***
## Dowmercredi   3.699e+03  9.796e+01   37.76   <2e-16 ***
## Dowsamedi    -1.678e+03  9.470e+01  -17.72   <2e-16 ***
## Dowvendredi   3.320e+03  9.764e+01   34.00   <2e-16 ***
## NetDemand.48  4.246e-01  1.472e-02   28.85   <2e-16 ***
## Holy1        -3.455e+03  3.007e+02  -11.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df Chi.sq p-value    
## s(wM)      7.134  8.762  343.4 < 2e-16 ***
## s(wM_s95)  3.517  4.661   18.2 0.00207 ** 
## s(Posan)  16.241 19.613  476.9 < 2e-16 ***
## s(Trend)   2.940  2.997  430.7 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.946   Deviance explained = 86.4%
## -REML =  16270  Scale est. = 1         n = 2008

Now the effect of Posan is smoother along most of the year, but it drops around Christmas even more than before. This is because many businesses are shut and people go on holiday during this period. An adaptive basis lets us spend many degrees of freedom where they are needed (the winter holiday) and few where the effect is smooth. Alternatively, we could have added a factor for the winter period (although one might point out that we have already included a dummy variable indicating bank holidays).
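A hedged, self-contained illustration of the adaptive basis with plain mgcv, on a made-up function that is flat everywhere except for one sharp dip (a caricature of the Christmas drop in Posan):

```r
library(mgcv)

# Flat signal with one narrow, deep dip
set.seed(4)
x <- seq(0, 1, length.out = 500)
f <- -2 * exp(-((x - 0.5) / 0.02)^2)
y <- f + rnorm(500, sd = 0.1)

m_cr <- gam(y ~ s(x, k = 50, bs = "cr"))  # one global smoothing parameter
m_ad <- gam(y ~ s(x, k = 50, bs = "ad"))  # smoothness allowed to vary with x

# The adaptive fit can stay very smooth away from the dip while still
# tracking the dip itself; compare the errors against the true function
rmse <- function(m) sqrt(mean((fitted(m) - f)^2))
c(cr = rmse(m_cr), ad = rmse(m_ad))
```

With a single global smoothing parameter, the "cr" fit has to choose between wiggling everywhere or flattening the dip; the adaptive basis avoids that trade-off.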

  4. Fit with mqgam and plot the effects for each quantile
nqu <- 5
qus <- seq(0.1, 0.9, length.out = nqu)
fitM <- mqgamV(form = form, data = UKload, err = 0.1, qu = qus)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5...............done 
## qu = 0.3............done 
## qu = 0.7............done 
## qu = 0.1...........done 
## qu = 0.9.............done
print(plot(fitM), pages = 1)

Notice that where the effects of the low and high quantiles diverge, the conditional variance of the response is increasing (other things being equal). Along wM_s95 we can also see that at low temperatures the load distribution is skewed to the right (again, other things being equal). In the plot for Posan, notice how the Christmas effect changes depending on the quantile: the lowest quantiles are the most strongly affected, dropping sharply and then bouncing back. We couldn't get such insights from a Gaussian GAM!
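The link between diverging quantile curves and conditional spread can be seen with a tiny simulation (the numbers are purely illustrative):

```r
# When the 0.1 and 0.9 quantile curves move apart along a covariate, the
# conditional spread of the response is growing with that covariate
set.seed(6)
x <- rep(c(0.5, 2), each = 2000)
y <- rnorm(4000, mean = 1, sd = x)   # sd grows with x, the mean does not

spread <- function(v) unname(diff(quantile(v, c(0.1, 0.9))))
c(low_x = spread(y[x == 0.5]), high_x = spread(y[x == 2]))
```

The inter-quantile distance scales with the conditional standard deviation, which is why quantile curves fan out where variance grows.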

We can also look at the parametric effects:

print(plot(fitM, allTerms = T, select = 5:7), pages = 1)

It is interesting to notice that the holiday effect is stronger (more negative) for the low quantiles.

  5. Model checking

We consider the third quantile (the median). First, we look at the bias caused by smoothing the loss:

indx <- 3
check(fitM[[indx]])

## Theor. proportion of neg. resid.: 0.5   Actual proportion: 0.5054781
## Integrated absolute bias |F(mu) - F(mu0)| = 0.01666654
## Method: REML   Optimizer: outer newton
## full convergence after 6 iterations.
## Gradient range [-0.001339629,0.004266669]
## (score 16270.07 & scale 1).
## Hessian positive definite, eigenvalue range [0.0001153833,3.101468].
## Model rank =  99 / 99 
## 
## Basis dimension (k) check: if edf is close to k' (maximum possible edf) 
## it might be worth increasing k. 
## 
##           k'   edf
## s(wM)     19  7.13
## s(wM_s95) 19  3.52
## s(Posan)  49 16.24
## s(Trend)   3  2.94

These checks mostly focus on the fraction of residuals falling below the fitted quantile, which should be close to 0.5 given that we are fitting quantile \(\tau = 0.5\).
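This is a direct consequence of the pinball (check) loss underlying quantile regression: for quantile \(\tau\), the minimiser of the loss is a point below which a fraction \(\tau\) of the data falls. A base-R sketch (qgam actually minimises a smoothed version of this loss, the ELF, which is the source of the small bias reported above):

```r
# Pinball loss for residuals r at quantile level tau
pinball <- function(r, tau) mean(pmax(tau * r, (tau - 1) * r))

# Minimising the loss over a constant recovers the empirical quantile,
# so about a fraction tau of the residuals end up below the fit
set.seed(5)
y <- rexp(1000)
tau <- 0.5
opt <- optimize(function(m) pinball(y - m, tau), range(y))$minimum

c(fit = opt, empirical = unname(quantile(y, tau)))
```

Trying tau = 0.9 instead moves the minimiser to the empirical 0.9 quantile, with 90 percent of residuals below it.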

We can also verify whether the fraction of points falling below the fit departs too much from \(\tau = 0.5\) along each covariate:

pl <- list()
pl[[1]] <- check1D(fitM[[indx]], x = "Dow") + l_gridQCheck1D(qu = qus[indx]) 
pl[[2]] <- check1D(fitM[[indx]], x = "wM") + l_gridQCheck1D(qu = qus[indx]) 
pl[[3]] <- check1D(fitM[[indx]], x = "wM_s95") + l_gridQCheck1D(qu = qus[indx])
pl[[4]] <- check1D(fitM[[indx]], x = "Posan") + l_gridQCheck1D(qu = qus[indx]) 

# To plot using grid.arrange, we need to extract the ggplot objects
library(gridExtra)
grid.arrange(grobs = lapply(pl, "[[", 1))

This looks good: most departures are within the 80% confidence bands.

5 Reaction times for Estonian case-inflected nouns

  1. Load data and fit linear model:
library(mgcViz)
load("data/est.rda")

m1 <- qgamV(RT ~ InfFamSize + Age + LogFrequency + WordLength + Trial,
            data=est, qu=0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.......done
summary(m1)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## RT ~ InfFamSize + Age + LogFrequency + WordLength + Trial
## 
## Parametric coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  529.15724   25.06661  21.110  < 2e-16 ***
## InfFamSize   -15.75155    4.33022  -3.638 0.000275 ***
## Age            3.89067    0.22211  17.517  < 2e-16 ***
## LogFrequency -16.96570    1.52765 -11.106  < 2e-16 ***
## WordLength    29.94322    1.61262  18.568  < 2e-16 ***
## Trial         -0.00132    0.05082  -0.026 0.979288    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## R-sq.(adj) =  0.211   Deviance explained = 18.7%
## -REML =  36153  Scale est. = 1         n = 5131

Trial does not seem to have a strong (linear) effect.

  2. Look at the fraction of observations falling below the fit:
pl <- list()
pl[[1]] <- check1D(m1, x = "Trial") + l_gridQCheck1D(qu = 0.5) 
pl[[2]] <- check1D(m1, x = est$Subject) + l_gridQCheck1D(qu = 0.5) 

# To plot using grid.arrange, we need to extract the ggplot objects
library(gridExtra)
grid.arrange(grobs = lapply(pl, "[[", "ggObj"))

There is a clear non-linear pattern along Trial, and there are massive individual differences across subjects.

  3. Add a smooth for Trial and a random effect for Subject:
m1 = qgamV(RT ~ InfFamSize + Age + LogFrequency + WordLength + s(Trial) + s(Subject, bs="re"),
           data=est, qu=0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..........done
pl <- list()
pl[[1]] <- check1D(m1, x = "Trial") + l_gridQCheck1D(qu = 0.5) 
pl[[2]] <- check1D(m1, x = est$Subject) + l_gridQCheck1D(qu = 0.5) 

grid.arrange(grobs = lapply(pl, "[[", "ggObj"))

summary(m1)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## RT ~ InfFamSize + Age + LogFrequency + WordLength + s(Trial) + 
##     s(Subject, bs = "re")
## 
## Parametric coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   519.787     78.523   6.620 3.60e-11 ***
## InfFamSize    -13.626      3.471  -3.926 8.65e-05 ***
## Age             3.991      1.646   2.425   0.0153 *  
## LogFrequency  -16.088      1.221 -13.171  < 2e-16 ***
## WordLength     30.514      1.266  24.095  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##               edf Ref.df  Chi.sq p-value    
## s(Trial)    5.028  6.137   21.94 0.00137 ** 
## s(Subject) 21.746 23.000 1736.16 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.393   Deviance explained = 33.2%
## -REML =  35280  Scale est. = 1         n = 5131

There are still some departures from 0.5 along Trial, but there is some improvement, and the non-linear effect of Trial seems important. The diagnostic plot along Subject now looks very good. Notice that both effects are significant.

  4. Try a tensor product effect
m1 <- qgamV(RT ~ InfFamSize + Age + te(LogFrequency, WordLength) + s(Trial) + s(Subject, bs="re"),
            data=est, qu=0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5...........done
plotRGL(sm(m1, 1), residuals = T)
## Warning in par3d(userMatrix = structure(c(1, 0, 0, 0, 0,
## 0.342020143325668, : font family "sans" not found, using "bitmap"

The bivariate effect of LogFrequency and WordLength looks pretty linear to me! We are probably better off using two linear effects.

  5. Fit several quantile models
qus <- seq(0.1, 0.9, length.out = 5)
m1 = mqgamV(RT ~ InfFamSize + Age + LogFrequency + WordLength + s(Subject, bs="re") + s(Trial),
            data=est, qu=qus)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..........done 
## qu = 0.3...............done 
## qu = 0.7..............done 
## qu = 0.1................done 
## qu = 0.9..........done
# Plotting all smooth effects
print(plot(m1), pages = 1)

The effect of Trial shows a fast learning effect, up to around 75 trials, followed by fatigue. It is interesting to notice that the learning effect seems much faster for very slow responses (high RT, quantile 0.9).

We now plot also the parametric effects:

print(plot(m1, allTerms = TRUE, select = 3:6), pages = 1)

Notice that all the confidence intervals get wider as we move toward the highest quantile (0.9). This is to be expected: the response-time distribution is strongly skewed to the right, so the data are quite sparse around the high quantiles. The effects of word frequency and length get stronger as we look at slower responses.
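A quick bootstrap check, with made-up exponential data standing in for skewed reaction times, shows why tail quantiles are estimated less precisely:

```r
# For a right-skewed sample, the 0.9 quantile sits where the data are
# sparse, so its sampling variability is larger than the median's
set.seed(7)
y <- rexp(500)

boot_se <- function(tau, B = 500) {
  sd(replicate(B, quantile(sample(y, replace = TRUE), tau)))
}
c(q50 = boot_se(0.5), q90 = boot_se(0.9))
```

The same mechanism widens the confidence intervals of the fitted quantile model as \(\tau\) approaches 0.9.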

6 Rainfall modelling in Switzerland

  1. Load data, create model formula and fit median GAM
library(mgcViz);
library(gamair);
data(swer)

form <- exra ~ s(nao) + s(elevation) + climate.region + s(E, N) + s(year, k = 5)
fit <- qgamV(form, data = swer, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5...............done
summary(fit)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## exra ~ s(nao) + s(elevation) + climate.region + s(E, N) + s(year, 
##     k = 5)
## 
## Parametric coefficients:
##                                            Estimate Std. Error z value
## (Intercept)                                  69.488      3.933  17.670
## climate.regionCentral Alpine north slope    -25.306      4.977  -5.084
## climate.regionCentral plateau               -24.225      5.686  -4.260
## climate.regionEastern Alpine north slope    -15.011      5.950  -2.523
## climate.regionEastern Jura                  -21.495      6.358  -3.381
## climate.regionEngadine                      -18.976      3.797  -4.997
## climate.regionNorth-eastern plateau         -14.585      6.341  -2.300
## climate.regionNorthern and central Grisons  -15.968      4.399  -3.630
## climate.regionValais                        -30.228      5.298  -5.705
## climate.regionWestern Alpine north slope    -29.827      5.071  -5.882
## climate.regionWestern Jura                  -21.164      6.833  -3.097
## climate.regionWestern plateau               -21.765      6.399  -3.401
##                                            Pr(>|z|)    
## (Intercept)                                 < 2e-16 ***
## climate.regionCentral Alpine north slope   3.69e-07 ***
## climate.regionCentral plateau              2.04e-05 ***
## climate.regionEastern Alpine north slope   0.011640 *  
## climate.regionEastern Jura                 0.000723 ***
## climate.regionEngadine                     5.82e-07 ***
## climate.regionNorth-eastern plateau        0.021431 *  
## climate.regionNorthern and central Grisons 0.000283 ***
## climate.regionValais                       1.16e-08 ***
## climate.regionWestern Alpine north slope   4.06e-09 ***
## climate.regionWestern Jura                 0.001952 ** 
## climate.regionWestern plateau              0.000671 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                 edf Ref.df Chi.sq  p-value    
## s(nao)        6.278  7.352  25.81 0.000753 ***
## s(elevation)  6.031  6.966  82.76 5.63e-15 ***
## s(E,N)       23.128 26.540 263.31  < 2e-16 ***
## s(year)       1.006  1.012   0.00 0.997441    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.484   Deviance explained = 36.5%
## -REML = 9204.4  Scale est. = 1         n = 2196
print(plot(fit), pages = 1)

  2. Fit a smooth trend for each climate region:
fit2 <- qgamV(exra ~ s(nao) + s(elevation) +
                     s(year, climate.region, bs = "fs", k = 5) + s(E, N), 
                     data = swer, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..........done
summary(fit2)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## exra ~ s(nao) + s(elevation) + s(year, climate.region, bs = "fs", 
##     k = 5) + s(E, N)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   49.784      1.828   27.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                           edf Ref.df Chi.sq  p-value    
## s(nao)                  6.414  7.465  27.79 0.000366 ***
## s(elevation)            5.930  6.843  94.68  < 2e-16 ***
## s(year,climate.region) 14.803 59.000  55.94 2.91e-11 ***
## s(E,N)                 24.888 27.371 297.46  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.487   Deviance explained = 36.8%
## -REML =   9231  Scale est. = 1         n = 2196
AIC(fit) - AIC(fit2)
## [1] 30.11005
plot(sm(fit2, 3)) + l_fitLine(alpha = 1)

The year-by-region effect uses relatively few degrees of freedom (edf); there might have been a slight decrease in rainfall in Valais, for instance.
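The bs = "fs" factor-smooth basis gives every group its own smooth of year while sharing a single smoothing parameter and shrinking all curves toward a common level, which is why the edf can stay low when the regional trends barely differ. A minimal sketch on simulated data (group structure and effect sizes are made up):

```r
library(mgcv)

# Eight groups sharing one underlying trend, with small group-specific tilts
set.seed(8)
ng <- 8; n <- 40
g <- factor(rep(1:ng, each = n))
x <- rep(seq(0, 1, length.out = n), ng)
y <- sin(2 * pi * x) + rep(rnorm(ng, sd = 0.3), each = n) * x +
     rnorm(ng * n, sd = 0.2)

# One smooth curve per level of g, all penalised together
m <- gam(y ~ s(x, g, bs = "fs", k = 5))

# The smooth's edf reflects how much the group curves actually differ
summary(m)$edf
```

Because the per-group deviations are shrunk like random effects, near-identical groups cost almost no extra degrees of freedom.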

  3. Fit a 3D spatio-temporal tensor product effect:
fit <- qgamV(exra ~ s(nao) + s(elevation) + climate.region +
                    te(E, N, year, d = c(2, 1), k = c(20, 5)), 
                    data = swer, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.............done
plotSlice(x = sm(fit, 3), 
          fix = list("year" = c(1985, 1995, 2005, 2015))) + l_fitRaster() + 
          l_fitContour() + l_points()

There doesn't seem to be much change in the precipitation pattern, although there may be a decrease in rainfall levels in the South East (in the Canton of Ticino).
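In te(E, N, year, d = c(2, 1)), the d argument groups the covariates into marginals: a 2-D isotropic spatial marginal over (E, N) and a 1-D marginal over year, with k = c(20, 5) setting the two marginal basis dimensions. A hedged sketch of the same construction on simulated data (the spatial pattern and drift below are invented):

```r
library(mgcv)

# Spatial pattern whose amplitude drifts slowly with year
set.seed(9)
n <- 400
E <- runif(n); N <- runif(n); year <- runif(n, 1985, 2015)
y <- sin(3 * E) * cos(3 * N) * (1 + 0.02 * (year - 2000)) + rnorm(n, sd = 0.1)

# d = c(2, 1): first marginal is a 2-D smooth of (E, N) with basis size 20,
# second is a 1-D smooth of year with basis size 5
m <- gam(y ~ te(E, N, year, d = c(2, 1), k = c(20, 5)))
summary(m)$r.sq
```

Treating (E, N) as a single isotropic marginal makes sense because easting and northing share units, while year lives on its own scale and gets its own marginal and smoothing parameter.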

  4. We plot two slices through time using rgl:
# These will not appear in the html output
plotRGL(x = sm(fit, 3), fix = c("year" = 1985), residuals = TRUE)
open3d()
## Warning in par3d(userMatrix = structure(c(1, 0, 0, 0, 0,
## 0.342020143325668, : font family "sans" not found, using "bitmap"
## glX 
##   2
plotRGL(x = sm(fit, 3), fix = c("year" = 2015), residuals = TRUE)
  5. Remove the tensor effect w.r.t. time and fit the model to several quantiles:
fitM <- mqgamV(exra ~ s(nao) + s(elevation) + climate.region + s(E, N) + s(year, k = 5), 
               data = swer, qu = seq(0.1, 0.9, length.out = 9) )
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5...............done 
## qu = 0.4..........done 
## qu = 0.6............done 
## qu = 0.3..........done 
## qu = 0.7................done 
## qu = 0.2..........done 
## qu = 0.8............done 
## qu = 0.1......................done 
## qu = 0.9....................done
# Plot univariate smooths
print(plot(fitM, select = c(1, 2, 4)), pages = 1)

summary(fitM[[9]])
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## exra ~ s(nao) + s(elevation) + climate.region + s(E, N) + s(year, 
##     k = 5)
## 
## Parametric coefficients:
##                                            Estimate Std. Error z value
## (Intercept)                                 100.489      8.153  12.326
## climate.regionCentral Alpine north slope    -25.083      9.828  -2.552
## climate.regionCentral plateau               -31.481     11.601  -2.714
## climate.regionEastern Alpine north slope    -13.118     12.944  -1.013
## climate.regionEastern Jura                  -32.119     13.340  -2.408
## climate.regionEngadine                      -23.663      7.265  -3.257
## climate.regionNorth-eastern plateau         -34.823     13.678  -2.546
## climate.regionNorthern and central Grisons  -20.479      8.711  -2.351
## climate.regionValais                        -39.535     11.226  -3.522
## climate.regionWestern Alpine north slope    -35.080      9.990  -3.511
## climate.regionWestern Jura                  -25.283     14.644  -1.727
## climate.regionWestern plateau               -24.701     13.349  -1.850
##                                            Pr(>|z|)    
## (Intercept)                                 < 2e-16 ***
## climate.regionCentral Alpine north slope   0.010707 *  
## climate.regionCentral plateau              0.006654 ** 
## climate.regionEastern Alpine north slope   0.310868    
## climate.regionEastern Jura                 0.016057 *  
## climate.regionEngadine                     0.001126 ** 
## climate.regionNorth-eastern plateau        0.010901 *  
## climate.regionNorthern and central Grisons 0.018728 *  
## climate.regionValais                       0.000429 ***
## climate.regionWestern Alpine north slope   0.000446 ***
## climate.regionWestern Jura                 0.084254 .  
## climate.regionWestern plateau              0.064249 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                 edf Ref.df  Chi.sq  p-value    
## s(nao)        1.002  1.003   0.004    0.951    
## s(elevation)  1.001  1.002  24.296 8.46e-07 ***
## s(E,N)       21.476 25.381 180.876  < 2e-16 ***
## s(year)       1.996  2.446   2.276    0.297    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.388   Deviance explained = 63.3%
## -REML =  11205  Scale est. = 1         n = 2196

It seems that the effect of NAO is very weak for extreme quantiles; however, the reason might simply be that the data are quite sparse in the tails. Also, the effect of year seems highly non-linear for quantile 0.9, but the summary shows that this effect is not significant.

We can also plot the spatial effect:

print(plot(fitM, select = 3), pages = 1)

Interestingly the spatial effect is much stronger for high quantiles (extreme rainfall) than for the low ones. For quantile 0.9 the spatial effect varies between +50mm in the Canton of Ticino and -40mm in the Canton of Grisons.

Finally we can see how the climate region effect changes depending on the quantile of interest:

print(plot(fitM, select = 5), pages = 1)